Multi-tenant Buzz relay: community_id as a server-resolved key (comprehensive rewrite)#1321
Draft
tlongwell-block wants to merge 43 commits into
…fence buzz-core gets the zero-I/O tenant identity types every scoped layer shares. TenantContext encodes conformance row-zero in the type system: no Default, no Deserialize, no public constructor except resolved(), which is meant to be called only from host resolution. Downstream code holds &TenantContext and can read but not mint a community, so client-chosen-community cannot type-check outside resolution. Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
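The fence described above can be sketched as follows. This is a minimal illustration that assumes `CommunityId` wraps a `u128` (the real type wraps a `Uuid`) and that `TenantContext` holds only a community and a host. Private fields plus the absence of `Default`/`Deserialize` mean downstream code can only borrow and read a context. Because `resolved()` is `pub`, the "only from host resolution" rule is enforced by lint and review rather than by the compiler (as the next commit's corrected doc says).

```rust
use std::fmt;

/// Stand-in for buzz-core's CommunityId (assumption: the real type wraps a Uuid).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CommunityId(u128);

impl fmt::Display for CommunityId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:032x}", self.0)
    }
}

/// No Default, no Deserialize, private fields: outside this module the only
/// way to obtain a TenantContext is the resolved() constructor.
#[derive(Debug, Clone)]
pub struct TenantContext {
    community: CommunityId,
    host: String,
}

impl TenantContext {
    /// Intended to be called only from host resolution (a review-enforced rule).
    pub fn resolved(community: CommunityId, host: String) -> Self {
        Self { community, host }
    }

    pub fn community(&self) -> CommunityId {
        self.community
    }

    pub fn host(&self) -> &str {
        &self.host
    }
}

/// Hypothetical downstream helper: it borrows the context and can read the
/// community, but has no way to mint a different one.
pub fn scoped_key(ctx: &TenantContext, suffix: &str) -> String {
    format!("buzz:{}:{}", ctx.community(), suffix)
}
```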
The frozen base for the multi-tenant rewrite.

Consolidated 0001 schema makes community_id a first-class, server-resolved key on every scoped row, mapped table-by-table to docs/multi-tenant-conformance.md.

Schema highlights:
- channels PK is (community_id, id): the same channel UUID may legitimately co-exist in two communities; child FKs (channel_members, workflows, thread_metadata) are composite (community_id, channel_id) so a child can never reference a cross-community channel. This is DB-enforced, not left to handler discipline. channels.community_id is immutable (BEFORE UPDATE trigger).
- communities.host uniqueness is UNIQUE(lower(host)); normalize_host applies the same rule on the resolution side, so case/dot/default-port variants can never split one tenant into two.
- every scoped unique/PK leads with community_id; cross-community dedup of the same signed event is allowed, while a within-community duplicate is rejected.
- new tables: communities (host map), scheduled_workflow_fires (the cron at-most-once claim), audit_log (per-community chain), and an explicit _operator_global_tables registry the migration lint reads.

buzz-core:
- normalize_host(host): the one shared host-canonicalization rule.
- TenantContext fence doc corrected to say plainly that it is a lint-and-review fence, not a compiler fence (resolved()/from_uuid are pub), which is honest about the guarantee the API actually gives.

Schema proven against Postgres with an adversarial fence suite (re-tenant rejected, cross-community FKs rejected, same-UUID/same-event cross-community allowed, host-case collision rejected). buzz-core: 189 tests + 2 doctests green.

Folds in review round 1 from Mari (channel global-uniqueness leak, host normalization, fence-claim honesty) and Sami (NIP-98 localhost normalization to be dropped in the auth lane).

Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
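The shared `normalize_host` rule is not shown in this PR text. A minimal sketch, assuming it lowercases, strips a default `:443`/`:80` port, and drops trailing dots (the "case/dot/default-port variants" named above), could look like this:

```rust
/// Hypothetical sketch of the shared host-canonicalization rule.
/// Assumed steps: trim, ASCII-lowercase, strip a default port, strip trailing dots.
pub fn normalize_host(raw: &str) -> String {
    let mut host = raw.trim().to_ascii_lowercase();
    // Only exact default-port suffixes are stripped; ":8443" or ":8080" survive.
    for port in [":443", ":80"] {
        if host.ends_with(port) {
            host.truncate(host.len() - port.len());
            break;
        }
    }
    // "example.com." and "example.com" name the same host.
    while host.ends_with('.') {
        host.pop();
    }
    host
}
```

Applying one function on both the write path (storing `communities.host`) and the resolution path is what keeps the `UNIQUE(lower(host))` constraint and lookups in agreement.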
Closes the last Lane-0 schema items before the frozen base:
- events.search_tsv TSVECTOR GENERATED ALWAYS AS (to_tsvector('simple',
content)) STORED + GIN idx_events_search_tsv. The Typesense->Postgres FTS
data shape, landed in Lane 0 because it touches the just-locked events
table (Quinn option A). GENERATED ALWAYS = single source of truth: proven
against PG that a client cannot forge search_tsv out of sync with content
(generated_always rejection). Index left minimal single-column GIN; the
search lane picks the final spelling after EXPLAIN (Max's caveat).
- Delete stale 0002_backfill_d_tag.sql / 0003_event_reminders.sql. In the
consolidated-from-scratch model 0001 already carries d_tag, not_before,
delivered_at, and idx_events_not_before; re-running the old additive
migrations would error (duplicate column / duplicate index name).
audit_log DDL shape confirmed for the audit-crate collapse (Dawn's lane):
PRIMARY KEY (community_id, seq), UNIQUE (community_id, hash), community_id
NOT NULL on every row. 0001 is the single source; buzz-audit drops its own
schema.rs / AUDIT_SCHEMA_SQL / ensure_schema() in the audit lane.
Re-proven against real Postgres — full fence suite green: T1 re-tenant
rejected, T6 cross-community member FK rejected, T6b same-community ok, T7
same channel UUID in two communities allowed, T8 host case-collision
rejected, T9 same event id in two communities allowed, plus the FTS
generated+GIN match and the forge-rejection. buzz-core: 189 + 2 doctests.
Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Co-authored-by: Mari <95cae996907d7cab9f5dbf43c0f53edeac6ab0b032a6feae4abfd784e467b3f5@sprout-oss.stage.blox.sqprod.co> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Add EventQuery::for_community so relay call sites can keep concise struct updates without restoring a tenantless Default. The constructor requires the server-resolved CommunityId and preserves the old optional filter defaults everywhere else.

Return the owning community host from the ephemeral-channel reaper by joining communities in the archive UPDATE. Reaper consumers can now build TenantContext per archived row from DB-resolved community+host instead of hoisting or forging a batch-level tenant.

Co-authored-by: Mari <95cae996907d7cab9f5dbf43c0f53edeac6ab0b032a6feae4abfd784e467b3f5@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
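The `for_community` pattern replaces `..Default::default()` with a constructor that cannot be called without a community. A minimal sketch follows; the `kinds`/`limit` fields and the `u128` community stand-in are assumptions, not the real `EventQuery` layout.

```rust
/// Stand-in for the server-resolved community id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CommunityId(pub u128);

/// Hypothetical subset of EventQuery. Note: deliberately no #[derive(Default)].
#[derive(Debug, Clone, PartialEq)]
pub struct EventQuery {
    pub community_id: CommunityId,
    pub kinds: Option<Vec<u32>>,
    pub limit: Option<u32>,
}

impl EventQuery {
    /// The only "defaults" constructor: every optional filter is None,
    /// but the community is mandatory.
    pub fn for_community(community_id: CommunityId) -> Self {
        Self { community_id, kinds: None, limit: None }
    }
}
```

Call sites keep the terse struct-update form, e.g. `EventQuery { kinds: Some(vec![39000]), ..EventQuery::for_community(community) }`, and a tenantless query no longer compiles.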
RateLimiter::check_and_increment now takes &TenantContext, and
rate_limit_key emits buzz:{community}:ratelimit:{pubkey_hex}:{suffix}.
Same pubkey active in two communities consumes two independent quotas,
matching the S1 cross-community isolation fence in the buzz-relay
rewrite spec.
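The key shape above can be illustrated with a short sketch. It assumes the community renders as 32 lowercase hex digits (a stand-in; the real key uses `Uuid`'s `Display`) and passes the `LimitType` suffix as a plain `&str`; `"events"` is a hypothetical suffix.

```rust
use std::fmt;

/// Stand-in for the server-resolved community id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CommunityId(u128);

impl fmt::Display for CommunityId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:032x}", self.0) // lowercase hex, so keys stay lowercase
    }
}

/// Minimal tenant context: only what the key builder reads.
pub struct TenantContext {
    community: CommunityId,
}

impl TenantContext {
    pub fn resolved(community: CommunityId) -> Self {
        Self { community }
    }
    pub fn community(&self) -> CommunityId {
        self.community
    }
}

/// buzz:{community}:ratelimit:{pubkey_hex}:{suffix}
pub fn rate_limit_key(ctx: &TenantContext, pubkey_hex: &str, suffix: &str) -> String {
    format!("buzz:{}:ratelimit:{}:{}", ctx.community(), pubkey_hex, suffix)
}
```

Because the community leads the key, one pubkey active in two communities increments two distinct counters.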
check_ip_connection stays operator-global by design. The IP fence runs
at connection acceptance, before host->community resolution has
completed (or, on resolve failure, instead of it). Threading
&TenantContext through it would invert the order of operations. Per-
(community, IP) caps, if ever needed as a tenant-fairness signal,
belong in an additive LimitType keyed on (community, ip) — not in this
trait.
RedisRateLimiter in buzz-pubsub follows the new trait signature.
AlwaysAllowRateLimiter test impl mirrors it. Two new tests pin the
behavior: the key includes the community prefix, and same-pubkey-two-
communities yields two distinct Redis keys.
Local cargo test -p buzz-auth: 36 passed. Local cargo test -p
buzz-pubsub: 3 passed, 6 Redis-required ignored. Workspace-wide check
not run locally (sqlx 0.9.0 requires rustc 1.94, local toolchain is
1.89 — same constraint Max hit on the pubsub lane); relying on CI for
the full integration compile.
(cherry picked from commit 6a92f0b)
Co-authored-by: Sami <f4a42a97e594b77bdbd8ee35191c8b28a94a4cb871d96f32921558275421fb68@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Adds the §5 pre-build gate for multi-tenant replay protection.
buzz-auth gains a Nip98ReplayGuard trait plus the
nip98_replay_key(ctx, event_id) helper. The trait's try_mark contract
requires atomic set-if-absent semantics; an in-process cache (moka,
DashMap) does not carry the freshness proof across pods under the
"any pod, any connection" architecture (§4B), so the production
implementation MUST be shared state. The Redis-backed impl lives in
buzz-pubsub as RedisNip98ReplayGuard and uses a single SET key 1 NX
EX <ttl> per claim.
Key shape: buzz:{community}:nip98:{event_id_hex}. Event ids are
content-addressed so natural cross-community collision is zero, but
the gate is fail-closed isolation — a same-id replay across
communities must consult two distinct seen-set rows, not one shared
row. Tests pin both the prefix and the cross-community isolation
guarantee.
TTL floor is DEFAULT_REPLAY_TTL_SECS = 120, matching the §5 gate
requirement and the doubled NIP-98 ±60s timestamp tolerance.
Implementations MAY clamp sub-floor TTLs up to the floor; they MUST
NOT honor smaller values. The Redis impl clamps.
Caller contract documented in the trait: verify first, then mark.
Burning a seen-set slot on a forgery would let an attacker who learns
a future event id DoS the legitimate event. On Err (Redis
unreachable) callers MUST fail closed.
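The key shape and the verify-then-mark contract can be sketched as below. This is a synchronous reduction of the trait (the real `try_mark` signature is not shown here), and `InMemoryGuard` is a test double only: per the text above, in-process state does not satisfy the cross-pod requirement, which is why production uses Redis `SET key 1 NX EX <ttl>`.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// TTL floor named in the commit (seconds).
pub const DEFAULT_REPLAY_TTL_SECS: u64 = 120;

/// Stand-in for the server-resolved community id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CommunityId(pub u128);

/// Key shape from the commit: buzz:{community}:nip98:{event_id_hex}.
pub fn nip98_replay_key(community: CommunityId, event_id_hex: &str) -> String {
    format!("buzz:{:032x}:nip98:{}", community.0, event_id_hex)
}

/// Hypothetical synchronous shape of the guard trait.
pub trait Nip98ReplayGuard {
    /// Atomic set-if-absent: Ok(true) on first claim, Ok(false) on replay,
    /// Err if the shared store is unreachable.
    fn try_mark(&self, key: &str, ttl_secs: u64) -> Result<bool, String>;
}

/// Test double only: state does not cross pods, and TTL expiry is ignored.
pub struct InMemoryGuard {
    seen: Mutex<HashSet<String>>,
}

impl InMemoryGuard {
    pub fn new() -> Self {
        Self { seen: Mutex::new(HashSet::new()) }
    }
}

impl Nip98ReplayGuard for InMemoryGuard {
    fn try_mark(&self, key: &str, _ttl_secs: u64) -> Result<bool, String> {
        let mut seen = self.seen.lock().map_err(|_| "lock poisoned".to_string())?;
        // HashSet::insert returns true only if the key was absent.
        Ok(seen.insert(key.to_string()))
    }
}

/// Caller contract: verify the signature first, then mark; any guard error rejects.
pub fn accept_nip98<G: Nip98ReplayGuard>(
    guard: &G,
    signature_ok: bool,
    community: CommunityId,
    event_id_hex: &str,
) -> bool {
    if !signature_ok {
        return false; // never burn a seen-set slot on a forgery
    }
    let key = nip98_replay_key(community, event_id_hex);
    match guard.try_mark(&key, DEFAULT_REPLAY_TTL_SECS) {
        Ok(first_claim) => first_claim,
        Err(_) => false, // store unreachable: fail closed
    }
}
```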
Not wired into a call site in this commit — there is no NIP-98 HTTP
handler in Lane 0 yet. Eva's relay-wiring lane will consume the trait
when the HTTP path lands; the contract is documented for that
integration.
Validation:
- cargo test -p buzz-auth --lib ✅ 40 passed (4 new in nip98_replay).
- cargo test -p buzz-pubsub --lib ✅ 3 passed, 9 Redis-required
ignored (3 new in nip98_replay).
- cargo test -p buzz-pubsub --lib nip98_replay -- --ignored against
local Redis ✅ 3 passed: first-claim/replay, cross-community
isolation, sub-floor TTL lifted to floor.
- Workspace check not run locally (sqlx 0.9.0 / rustc 1.94 vs local
1.89); CI catches it.
(cherry picked from commit a2a9ef4)
Co-authored-by: Sami <f4a42a97e594b77bdbd8ee35191c8b28a94a4cb871d96f32921558275421fb68@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
…acing

Red-team pass against the auth lane surfaced one real bug and two robustness gaps. All three are caught by tests; the bug was verified by temporarily reverting the fix and watching the test fail with the real Redis error.

1. Real bug: a caller passing ttl_secs > i64::MAX (e.g. u64::MAX from a config bug) caused Redis to return "ResponseError: value is not an integer or out of range" from `SET NX EX <ttl>`. RedisNip98ReplayGuard then returned Err, the trait contract forces callers to fail closed, and every NIP-98-gated request from that point on would have errored with no visible link back to the bad config.
   Fix: introduce MAX_REPLAY_TTL_SECS (1 hour, 30× the natural physical maximum and well inside i64::MAX) and clamp ttl_secs into [DEFAULT, MAX] before the SET. New ignored-Redis test `above_ceiling_ttl_is_clamped` exercises the path with u64::MAX and asserts the claim+replay sequence succeeds, which it only does with the clamp.
2. Robustness: pin "all rate-limit and replay key components are lowercase ASCII" as a unit-level invariant. If pubkey::to_hex, Uuid::Display, or LimitType::key_suffix ever started emitting uppercase, the same logical (community, pubkey/event_id) would map to two distinct Redis keys, silently doubling the rate-limit quota or breaking the seen-set's identity. Two new tests (`rate_limit_key_components_are_lowercase`, `key_components_are_lowercase`) catch the regression in CI rather than in production.
3. Robustness: structured tracing on every Redis failure path with `community = %ctx.community()` as a structured field, so ops can group log alerts by tenant without needing the community id to be embedded in the AuthError string. The user-facing AuthError::Internal payload keeps the existing convention (consistent with its rate_limit.rs neighbors); the per-tenant context lives in tracing fields, not in the error string.

Also: add `ttl_floor_below_ceiling` and `max_ttl_fits_in_redis_signed_ex` unit tests so the two TTL constants can't drift past each other or above Redis's signed-EX limit in a future edit.

Out of scope for this lane (flagged to other lane owners):
- AuthError::Internal generally embeds raw downstream error strings (existing pattern across rate_limit.rs and nip98_replay.rs). This could leak community/tenant identifiers if those strings ever surface to clients. The audit lane (Quinn) owns the error-message safety rule per Eva's [6] lane split.
- check_ip_connection MUST be called before host resolution, on every connection (including failed-host-resolution attempts). Otherwise an attacker who picks a non-matching host header bypasses the IP cap. Wiring lives in the relay-wiring lane (Eva).

Validation:
- cargo test -p buzz-auth --lib: 44 passed (4 new red-team tests).
- cargo test -p buzz-pubsub --lib: 3 passed, 10 Redis-required ignored.
- cargo test -p buzz-pubsub --lib nip98_replay -- --ignored against local Redis: 4 passed (1 new ceiling-clamp test).
- Bug verified: with the clamp temporarily reverted, the above_ceiling_ttl_is_clamped test fails with the real Redis error "value is not an integer or out of range", proving the test catches the regression, not just the fix.

(cherry picked from commit f54d728)
Co-authored-by: Sami <f4a42a97e594b77bdbd8ee35191c8b28a94a4cb871d96f32921558275421fb68@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
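The TTL clamp and its drift tripwires can be sketched with `Ord::clamp` plus compile-time assertions. The constant values (120 and 3600) come from the text; expressing the tripwires as `const` asserts instead of unit tests is an alternative shown for illustration, not what the commit does.

```rust
/// Floor: the doubled NIP-98 ±60s tolerance.
pub const DEFAULT_REPLAY_TTL_SECS: u64 = 120;
/// Ceiling: 1 hour, well inside Redis's signed 64-bit EX argument.
pub const MAX_REPLAY_TTL_SECS: u64 = 3600;

// Compile-time tripwires: the build breaks if either constant drifts.
// The floor must stay below the ceiling (clamp panics if min > max),
// and the ceiling must fit in an i64 for `SET ... EX <ttl>`.
const _: () = assert!(DEFAULT_REPLAY_TTL_SECS < MAX_REPLAY_TTL_SECS);
const _: () = assert!(MAX_REPLAY_TTL_SECS <= i64::MAX as u64);

/// Lift sub-floor TTLs to the floor and cap runaway values (e.g. u64::MAX).
pub fn effective_ttl(requested: u64) -> u64 {
    requested.clamp(DEFAULT_REPLAY_TTL_SECS, MAX_REPLAY_TTL_SECS)
}
```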
Two adversarially-proven multi-tenant fences for the auth lane on the frozen Lane 0 SHA:

1. NIP-98 verifier: drop loopback aliasing unconditionally. normalize_url() collapsed localhost / ::1 -> 127.0.0.1, a testing convenience that becomes a row-zero side door under multi-tenant. The u-tag host is the community binding (docs/multi-tenant-conformance.md, NIP-98 row); collapsing the three would let an event signed for localhost pass against a 127.0.0.1-resolved community (or vice versa). Inverted the localhost test to bite the new strict rule: signed-for-one vs expected-other now REJECTS, while identity still passes. Adversarial: re-introduced the aliasing -> test goes red -> restored.
2. ChannelAccessChecker: thread &TenantContext through every method. Frozen 0001 has channels PK (community_id, id), so the same UUID legitimately co-exists across communities. A bare `WHERE id =` implementation would be a cross-community existence oracle. This mirrors buzz-db rule 4a.1 on the auth side. MockAccessChecker is keyed on (community, pubkey, channel_id); new test access_does_not_cross_communities bites the bare-id direction. Adversarial: dropped the community filter from the mock -> test goes red -> restored.

There is no external impl of ChannelAccessChecker in-tree (DB uses a separate free function under Mari's lane), so the trait signature change is contained.

cargo test -p buzz-auth: 45 passed / 0 failed.

Lane: auth (buzz-auth). Base: e349d76 (frozen Lane 0).

(cherry picked from commit 3df6179)
Co-authored-by: Sami <f4a42a97e594b77bdbd8ee35191c8b28a94a4cb871d96f32921558275421fb68@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
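The second fence's mock can be sketched as a set keyed on the full (community, pubkey, channel) triple. The single-method, synchronous `can_access` trait and the `u128` id stand-ins are assumptions; the real trait threads `&TenantContext` through every method.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct CommunityId(pub u128);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ChannelId(pub u128);

pub struct TenantContext {
    community: CommunityId,
}

impl TenantContext {
    pub fn resolved(community: CommunityId) -> Self {
        Self { community }
    }
    pub fn community(&self) -> CommunityId {
        self.community
    }
}

/// Hypothetical single-method reduction of ChannelAccessChecker.
pub trait ChannelAccessChecker {
    fn can_access(&self, ctx: &TenantContext, pubkey_hex: &str, channel: ChannelId) -> bool;
}

#[derive(Default)]
pub struct MockAccessChecker {
    grants: HashSet<(CommunityId, String, ChannelId)>,
}

impl MockAccessChecker {
    pub fn grant(&mut self, community: CommunityId, pubkey_hex: &str, channel: ChannelId) {
        self.grants.insert((community, pubkey_hex.to_string(), channel));
    }
}

impl ChannelAccessChecker for MockAccessChecker {
    fn can_access(&self, ctx: &TenantContext, pubkey_hex: &str, channel: ChannelId) -> bool {
        // Keyed on the full triple: a lookup by channel id alone would answer
        // "this UUID exists somewhere", which is a cross-community oracle.
        self.grants
            .contains(&(ctx.community(), pubkey_hex.to_string(), channel))
    }
}
```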
(cherry picked from commit 69237ef) Co-authored-by: Max <d8473ee32b973aa31a21a65adddcc4b69cc2a8a4dee8121ecd51926e0cddbc02@sprout-oss.stage.blox.sqprod.co> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
(cherry picked from commit 4af7348) Co-authored-by: Max <d8473ee32b973aa31a21a65adddcc4b69cc2a8a4dee8121ecd51926e0cddbc02@sprout-oss.stage.blox.sqprod.co> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
The Lane-0 freeze landed `events.search_tsv TSVECTOR GENERATED ALWAYS AS
(to_tsvector('simple', content)) STORED` + `GIN (search_tsv)` directly in
the schema. With that in place the entire Typesense apparatus is dead
weight: there is nothing to index out-of-band, no consistency window to
reason about, no client-forgeable index/content drift. Indexing is the
SQL write.
This rewrites `crates/buzz-search/` from scratch around that:
- `query.rs`: one SQL builder. `community_id = $ctx` is the first
predicate of every executed statement and is unconditional —
`SearchQuery` requires a `CommunityId` at the type level (no
construction path omits it). `search_tsv @@ websearch_to_tsquery(...)`
is the FTS predicate; `ts_rank_cd DESC, created_at DESC, id` is the
order. Channel scope replaces today's `__global__` sentinel with
`channel_id IS NULL`. Empty query short-circuits without a roundtrip.
- `lib.rs`: thin `SearchService { pool }`. Takes `&PgPool` directly so
the crate stays a leaf — no buzz-db dependency. Re-exports
`CommunityId` for callers that need to mint the fence.
- `error.rs`: collapsed to one variant (`Db(sqlx::Error)`); empty
queries are not errors.
- Deleted `collection.rs` and `index.rs` (Typesense HTTP client and
indexer). Dropped `reqwest`/`serde`/`serde_json`/`chrono`/`nostr`
from `Cargo.toml`.
- Added `tests/fts_integration.rs` — 8 integration tests against real
Postgres, each on its own throwaway schema applying the frozen
`migrations/0001_initial_schema.sql` via `include_str!`. The
load-bearing one is `search_does_not_return_other_community_events`:
mutating the `community_id = $ctx` predicate to `1=1` makes that
test go red (verified, then reverted) — the fence bites where it
has to.
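The `query.rs` design can be sketched as a type whose only constructor takes a `CommunityId` and whose SQL always opens with the community predicate. The exact statement text, the `'simple'` config passed to `websearch_to_tsquery`, and the `$1`/`$2` bind order are assumptions chosen to match the generated column and the ordering described above.

```rust
/// Stand-in for the server-resolved community id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CommunityId(pub u128);

/// Hypothetical query type: no constructor omits the community.
pub struct SearchQuery {
    community: CommunityId,
    text: String,
}

impl SearchQuery {
    pub fn new(community: CommunityId, text: &str) -> Self {
        Self { community, text: text.trim().to_string() }
    }

    /// Bound as $1; the caller cannot supply a different value.
    pub fn community(&self) -> CommunityId {
        self.community
    }

    /// Bound as $2.
    pub fn text(&self) -> &str {
        &self.text
    }

    /// None for an empty query, so no database roundtrip happens.
    /// Otherwise `community_id = $1` is the first predicate, unconditionally.
    pub fn to_sql(&self) -> Option<&'static str> {
        if self.text.is_empty() {
            return None;
        }
        Some(
            "SELECT id FROM events \
             WHERE community_id = $1 \
             AND search_tsv @@ websearch_to_tsquery('simple', $2) \
             ORDER BY ts_rank_cd(search_tsv, websearch_to_tsquery('simple', $2)) DESC, \
             created_at DESC, id",
        )
    }
}
```

Only candidate ids come back; as the conformance note says, the relay still refetches and re-authorizes each hit.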
Conformance row 50 — search re-auth and one-shot NIP-50 — is unchanged
in shape: the relay refetches canonical events per hit through buzz-db's
scoped fetcher and runs the access predicate. Search is never the
access boundary; this crate just returns candidate ids. The row's
Typesense prose rewrite is owned by Eva's integration lane (one writer
per path).
EXPLAIN ANALYZE evidence on a 200k-row community confirms the planner
picks `Bitmap Index Scan on events_p<...>_search_tsv_idx` for the
populated partition (full plan in RESEARCH/SEARCH_LANE_FTS_EXPLAIN.md
in the workspace). Single-column `GIN (search_tsv)` is sufficient at
this scale — no `btree_gin` needed (Max's caveat holds).
Cross-lane removals owed to Eva (relay-wiring lane, not this commit):
- relay state.rs: remove `search_index_tx` mpsc + worker
- relay main.rs: remove `search.ensure_collection()` call
- relay handlers/event.rs: remove `search_index_tx.send()`
- relay api/bridge.rs::handle_bridge_search: rewrite to new API
- relay handlers/req.rs::handle_search_req: rewrite to new API
- relay handlers/req.rs::build_search_channel_scope_filter: delete
- relay bin/reindex_kind0.rs: delete
- docker-compose.yml: drop typesense service + volume
- docs/multi-tenant-conformance.md row 50: rewrite Typesense prose
Tests: `cargo test -p buzz-search --test fts_integration --
--include-ignored --test-threads=1` — 8 passed, 0 failed.
Clippy: `cargo clippy -p buzz-search --all-targets -- -D warnings` — clean.
(cherry picked from commit e31c098)
Co-authored-by: Quinn <96f056ad5f2305c8ddf637dc65d048aa4c12d7daeb8867690e34fca46b0ef64c@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
The legacy 2x2 `(channel_ids: Option<Vec<Uuid>>, include_channel_less: bool)` shape could not unambiguously express "channel-less events only" — both `Some(vec![]) + true` and `None + true` fell into the no-constraint branch, silently broadening to all community channels rather than restricting to `channel_id IS NULL`. That matched the legacy Typesense `channel_id:=__global__` sentinel one way (per-channel + global) but not the other (global only).
Replace with a single `ChannelScope` enum whose three variants, plus one caller-side short-circuit, map 1-to-1 onto the four cells of the legacy `(accessible_channels, include_global)` matrix:
- non-empty + true -> ChannelsOrChannelLess(accessible)
- non-empty + false -> Channels(accessible)
- empty + true -> ChannelLessOnly (the variant the old shape could not express)
- empty + false -> caller short-circuits to EOSE, doesn't call search
Emitted SQL fragments are byte-identical to the legacy match for the three carry-over cases; `ChannelLessOnly` adds `AND channel_id IS NULL` — the fence the old type could not express.
Verification:
- Full package `cargo test -p buzz-search -- --include-ignored --test-threads=1`: 9/9 green (8 existing + 1 new `channel_less_only_excludes_per_channel_events`).
- Adversarial mutation: replaced the `ChannelLessOnly` SQL emission with a no-op (the buggy semantic the old shape produced); new test went RED with 3 hits instead of 1, restored, green again. The fix is the emitted predicate, not the variant name.
- clippy -D warnings clean; fmt clean.
- Empty-vec edge cases are intentionally not special-cased: `Channels(vec![])` emits `channel_id = ANY('{}')` (false-for-all, zero hits, preserves the old early-return semantic via SQL); `ChannelsOrChannelLess(vec![])` is equivalent to `ChannelLessOnly`.
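The mapping above can be sketched as a conversion plus a fragment emitter. The variant names and the `channel_id IS NULL` / `channel_id = ANY(...)` predicates come from the text; the exact fragment strings, the `$n` bind slot, and `u128` channel ids (standing in for `Uuid`) are assumptions.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ChannelScope {
    ChannelsOrChannelLess(Vec<u128>),
    Channels(Vec<u128>),
    ChannelLessOnly,
}

impl ChannelScope {
    /// Hypothetical fragment; `$n` is the bind slot for the channel-id array.
    pub fn sql_fragment(&self, n: usize) -> String {
        match self {
            ChannelScope::ChannelsOrChannelLess(_) => {
                format!(" AND (channel_id = ANY(${n}) OR channel_id IS NULL)")
            }
            // With an empty array, ANY('{}') is false for every row: zero hits.
            ChannelScope::Channels(_) => format!(" AND channel_id = ANY(${n})"),
            // The case the legacy 2x2 shape could not express.
            ChannelScope::ChannelLessOnly => " AND channel_id IS NULL".to_string(),
        }
    }
}

/// Legacy (accessible_channels, include_global) -> scope; None means the
/// caller short-circuits to EOSE without calling search.
pub fn from_legacy(accessible: Vec<u128>, include_global: bool) -> Option<ChannelScope> {
    match (accessible.is_empty(), include_global) {
        (false, true) => Some(ChannelScope::ChannelsOrChannelLess(accessible)),
        (false, false) => Some(ChannelScope::Channels(accessible)),
        (true, true) => Some(ChannelScope::ChannelLessOnly),
        (true, false) => None,
    }
}
```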
Coordinated with Eva ahead of relay-wiring sweep at req.rs and bridge.rs so call sites land against the final type, not the buggy one.
(cherry picked from commit c8cd333)
Co-authored-by: Quinn <96f056ad5f2305c8ddf637dc65d048aa4c12d7daeb8867690e34fca46b0ef64c@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Convert the audit log from one global hash chain to an independent per-community chain, conforming to the frozen Lane-0 0001 schema.

- Collapse to one DDL: delete schema.rs / AUDIT_SCHEMA_SQL and their lib.rs exports. The 0001 migration is the sole owner of audit_log.
- Chain shape: PK (community_id, seq), seq monotonic per-community, UNIQUE (community_id, hash); hash/prev_hash/actor_pubkey as BYTEA; object_id TEXT generalizes the old event_id/channel_id; detail JSONB.
- community_id is folded into the SHA-256 (it leads the hash) so a row cannot be lifted out of one community's chain and re-verified in another. Per-community advisory lock: communities never serialize each other's audit writes (no throughput bottleneck, no timing oracle).
- verify_chain / get_entries scoped to a CommunityId.
- Error variants carry only the per-community seq (meaningless without its chain), never community_id, hash values, or raw action strings.
- AUTH-body protection becomes caller discipline + the AuditAction enum (AuthSuccess/AuthFailure carry outcome metadata, never the token); the dropped event_kind column is not persisted.

13/13 green (7 unit + 6 Postgres isolation). Adversarial: disabling the community_id line in compute_hash turns community_id_is_part_of_identity RED (two communities hash identically); restored to green.

(cherry picked from commit ba11d66)
Co-authored-by: Dawn <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
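The "community_id leads the hash" rule concerns the bytes fed to SHA-256. A minimal sketch of such a preimage is below; the field subset and ordering after the community id are assumptions (the real `compute_hash` is not shown), and the digest step itself is left out so the example needs no external crate. Length-prefixing the variable fields keeps two different rows from concatenating to the same bytes.

```rust
/// Appends a u64 length prefix, then the bytes, so adjacent variable-length
/// fields cannot be re-split into a different row with identical bytes.
fn push_field(buf: &mut Vec<u8>, bytes: &[u8]) {
    buf.extend_from_slice(&(bytes.len() as u64).to_be_bytes());
    buf.extend_from_slice(bytes);
}

/// Hypothetical preimage for one audit row. The community id comes first, so
/// identical rows in two communities yield different preimages and, barring a
/// SHA-256 collision, different digests.
pub fn audit_hash_preimage(
    community_id: &[u8; 16],
    seq: i64,
    prev_hash: &[u8],
    action: &str,
    object_id: &str,
) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(community_id); // fixed 16 bytes, always first
    buf.extend_from_slice(&seq.to_be_bytes()); // fixed 8 bytes
    push_field(&mut buf, prev_hash);
    push_field(&mut buf, action.as_bytes());
    push_field(&mut buf, object_id.as_bytes());
    buf
}
```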
Make the provenance fence visible in the type signature, not a per-call-site convention. `NewAuditEntry.community_id` becomes `CommunityId` (the server-resolved newtype) instead of a raw `Uuid`, so a wiring call site can no longer pass an arbitrary UUID off the event/channel being acted on: the only doors to a `CommunityId` are host resolution or a server-scoped DB row, never client input.

The DB-row type `AuditEntry` stays `Uuid`: sqlx reads/writes it directly and `compute_hash` does `.as_bytes()` on it, so the stored hash bytes are byte-for-byte identical and the already-integrated chain stays valid (no migration, no re-hash). The `as_uuid()` dereference moves inside `AuditService::log` at the DB boundary, where the column is written; the advisory-lock key is unchanged (CommunityId's Display delegates to Uuid).

Drop the now-orphaned `Serialize`/`Deserialize` derive (and the `#[serde(default)]` on `detail`) from `NewAuditEntry`: it has no serde consumer, since it travels only through the in-process audit sink (mpsc), never across a wire/DB boundary. Keeping it non-deserializable reinforces the fence: no client blob can mint a NewAuditEntry.

Full package green (13/13, incl. the 6 PG isolation tests and the community_id_is_part_of_identity fence); clippy -D warnings + fmt clean. Adversarially verified that the fence is non-vacuous: dropping community_id from compute_hash turns community_id_is_part_of_identity RED; restored.

(cherry picked from commit 284cc69)
Co-authored-by: Dawn (sprout agent) <c6237ef84fa537c78dcee78efd2d4e59f728859c7f194da42ac51ededfa0be05@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
…ind)

Conformance row zero: req.community = resolve_host(connection.host), bound before any handler observes tenant data. This lands the relay-side seam:
- HostResolver trait (native async fn, no async-trait dep): buzz-db's Db::resolve_host satisfies it; the relay depends on the trait, not the query, so the binding is testable without a database. Callers are generic over R, with no dyn dispatch (the relay holds a concrete Db).
- bind_community(): normalizes the host with the one shared rule, resolves it, and fails closed on BOTH an unmapped host AND a lookup error; there is no path that yields a default/fallback community. UnmappedHost is a distinct variant that the call site turns into a GENERIC reject (no host echo, no unmapped-vs-error distinction) so an unauthenticated caller can't probe which hosts exist.
- TenantContext carries the normalized host, so downstream NIP-05/audit labelling and the NIP-98 u-host check all see the canonical form the community was resolved from.

Tests (4, green) cover known-host bind, variant normalization (case/dot/default-port can't split a tenant), unmapped fail-closed, and lookup-error fail-closed-not-default. Adversarially verified: mutating the None arm to fall through to a nil default community turns unmapped_host_fails_closed RED.

Seam contract for the buzz-db lane (Mari): Db::resolve_host(&self, normalized_host: &str) -> Result<Option<CommunityId>, DbError>, a SELECT id FROM communities WHERE host = $1 on the normalized key. The router call site (nip11_or_ws_handler) and threading TenantContext through handle_connection land next in this lane.

(cherry picked from commit 0be8532e0e94e5ecd6529f2f3f52255dd36f6009)
Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
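The fail-closed binding can be sketched synchronously for brevity (the commit's `HostResolver` uses a native `async fn`). The simplified normalization (case and trailing dot only), the `String` error type, and the `MapResolver`/`BrokenResolver` test resolvers are all assumptions made to keep the example self-contained.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CommunityId(pub u128);

#[derive(Debug, PartialEq)]
pub enum BindError {
    UnmappedHost,
    Lookup(String),
}

/// Synchronous stand-in for the async HostResolver seam.
pub trait HostResolver {
    fn resolve_host(&self, normalized_host: &str) -> Result<Option<CommunityId>, String>;
}

pub struct TenantContext {
    community: CommunityId,
    host: String,
}

impl TenantContext {
    pub fn community(&self) -> CommunityId {
        self.community
    }
    pub fn host(&self) -> &str {
        &self.host
    }
}

/// Simplified normalization for the sketch: trailing dots and case only.
fn normalize_host(raw: &str) -> String {
    raw.trim().trim_end_matches('.').to_ascii_lowercase()
}

/// Generic over R (no dyn dispatch). Every non-success path is an Err.
pub fn bind_community<R: HostResolver>(resolver: &R, raw_host: &str) -> Result<TenantContext, BindError> {
    let host = normalize_host(raw_host);
    match resolver.resolve_host(&host) {
        Ok(Some(community)) => Ok(TenantContext { community, host }),
        Ok(None) => Err(BindError::UnmappedHost), // never a default community
        Err(e) => Err(BindError::Lookup(e)),      // lookup failure also fails closed
    }
}

pub struct MapResolver(pub HashMap<String, CommunityId>);

impl HostResolver for MapResolver {
    fn resolve_host(&self, h: &str) -> Result<Option<CommunityId>, String> {
        Ok(self.0.get(h).copied())
    }
}

pub struct BrokenResolver;

impl HostResolver for BrokenResolver {
    fn resolve_host(&self, _: &str) -> Result<Option<CommunityId>, String> {
        Err("db unavailable".to_string())
    }
}
```

The call site would map both error variants to the same generic reject so the response does not reveal which hosts exist.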
…ing (§5b)
Plan §5b, decided by Tyler: rather than sticky-route huddles or ship a
silent split-room, a horizontally-scaled deployment surfaces a clear,
client-handleable unavailable signal on huddle join.
- config: huddle_audio_available bool, env BUZZ_HUDDLE_AUDIO_AVAILABLE.
Defaults true so single-pod (N=1) deployments keep today's huddle
behavior unchanged. Operators running multiple relay pods set it false.
- audio handler: after auth + membership pass and BEFORE get_or_create joins
a room, if huddle_audio_available is false we send
{type:error, code:huddle_audio_unavailable, message:...} and return — no
silent room join whose frames never cross pods.
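The flag and the join gate above reduce to two small functions. The parsing rule (unset keeps `true`; only an explicit `false` or `0`, case-insensitive, disables) is an assumption, since the exact `Config::from_env` handling is not shown; the error code string comes from the text.

```rust
/// Hypothetical parse rule for BUZZ_HUDDLE_AUDIO_AVAILABLE: unset keeps the
/// N=1 default (true); only an explicit "false" or "0" disables huddle audio.
pub fn huddle_audio_available(raw: Option<&str>) -> bool {
    match raw.map(|v| v.trim().to_ascii_lowercase()) {
        Some(v) if v == "false" || v == "0" => false,
        _ => true,
    }
}

/// Gate checked after auth + membership pass and before any room is joined.
/// Returns the error code to send instead of joining, if any.
pub fn huddle_join_gate(audio_available: bool) -> Option<&'static str> {
    if audio_available {
        None
    } else {
        Some("huddle_audio_unavailable")
    }
}
```

Reading the environment would look like `huddle_audio_available(std::env::var("BUZZ_HUDDLE_AUDIO_AVAILABLE").ok().as_deref())`.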
Why a config flag and not pod-count self-detection: the relay can't reliably
count its own pods; an explicit operator flag is the honest model and keeps
the §4 fork-B (any-pod-any-connection) generic routing free of huddle
stickiness. The real fix is the out-of-relay media/SFU service (Tyler's
long-term target), out of scope for this rewrite.
Tests: default-true (N=1 compat) and env-false-disables, both green. Full
buzz-relay --lib green at --test-threads=1 (374). Note for this lane: there
is a pre-existing parallel-run env-var race (global_presence_pubsub test
calls Config::from_env without the config tests' ENV_MUTEX guard) — not a
regression from this change; flagged to fix in the wiring lane.
(cherry picked from commit cc2bc29d4429da9e1a3e80217936340a4c1ca721)
Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Executable form of docs/multi-tenant-conformance.md: one module per obligation-table surface row (14 surfaces, 18 isolation tests) plus the N=1 parity gate documented against the existing e2e suites. Each A/B isolation test addresses two hosts (RELAY_URL_A/RELAY_URL_B) on the SAME relay process — one binary, one Postgres, one Redis, two communities — proving no tenant-observable state crosses a boundary derived from host, never caller input. All #[ignore] (need a running two-host relay) so a normal cargo test run reports 0 passed / 18 ignored; they cannot fake-pass. Rows the lane hasn't landed yet panic via pending_lane(lane, obligation), which names the exact obligation for the owner to fill in and makes the remaining work one grep. Lane ownership tagged per module. (cherry picked from commit 9d6d35f07a17fcf5ccd8a6f20fdede3349e67024) Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
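The `pending_lane(lane, obligation)` convention can be sketched as a diverging helper whose panic message is easy to grep. The message format and the example test name are hypothetical; only the helper's name and its two parameters come from the text.

```rust
/// Hypothetical helper: an unimplemented conformance row panics with a
/// message naming the owning lane and the exact obligation.
pub fn pending_lane(lane: &str, obligation: &str) -> ! {
    panic!("pending_lane: lane={lane} obligation={obligation}");
}

/// Shape of one row: ignored by default because it needs a running
/// two-host relay, and it cannot fake-pass while the lane is pending.
#[test]
#[ignore = "needs a running two-host relay (RELAY_URL_A / RELAY_URL_B)"]
fn example_row_isolation() {
    pending_lane("example-lane", "example obligation");
}
```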
…row) The conformance obligation for the NIP-11 surface: RelayInfo::build must not grow unscoped DB/search/audit inputs, so an unauthenticated NIP-11 read can never become a cross-community enumeration oracle. Binds RelayInfo::build to its exact allowed signature via a const fn pointer. Adding a &Db / &AppState / search / audit input makes the function-pointer type stop matching and breaks the build at the fence — a silent cross-tenant leak becomes a hard compile error, deny-lint style. Adversarially proven: injecting a &AppState param into build() produces error[E0308] mismatched types at the fence const (plus E0061 at the call sites); reverted to confirm the fence, not the call sites alone, is the guard. buzz-relay package 374 green at --test-threads=1. (cherry picked from commit 76a4044c7cfb1c96a6817be1e81c7ae42d1ea3da) Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
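The const fn-pointer fence works because a function item only coerces to a `fn` pointer type with exactly matching parameters. A minimal sketch, assuming a hypothetical `RelayConfig` with `name`/`description` fields as the sole allowed input:

```rust
/// Hypothetical allowed input: static relay configuration only.
pub struct RelayConfig {
    pub name: String,
    pub description: String,
}

pub struct RelayInfo {
    pub name: String,
    pub description: String,
}

impl RelayInfo {
    pub fn build(config: &RelayConfig) -> RelayInfo {
        RelayInfo {
            name: config.name.clone(),
            description: config.description.clone(),
        }
    }
}

// Fence: if build() ever gains a &Db / &AppState / search / audit parameter,
// this coercion stops type-checking (E0308) and the build fails here.
const _NIP11_BUILD_FENCE: fn(&RelayConfig) -> RelayInfo = RelayInfo::build;
```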
Archived identity state is tenant-local; a pubkey archived in one community must not read as archived in another. Thread CommunityId through the archived identity queries and DB wrappers, and bind the composite key used by the migration. Co-authored-by: Mari <95cae996907d7cab9f5dbf43c0f53edeac6ab0b032a6feae4abfd784e467b3f5@sprout-oss.stage.blox.sqprod.co> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Threads server-resolved `community_id`/`TenantContext` through the whole relay call graph and the operator CLI against the v3 DB/pubsub API, so every scoped row read and every Redis publish names a community the relay derived from data, never from caller input.

Relay (`crates/buzz-relay`):
- Read-path caches take `CommunityId`; write/invalidate publishers take `&TenantContext` (the Redis topic key needs the host). The cross-node fan-out path only has the community, so caches stay constructible there.
- Doors fail closed: WS/bridge/media/NIP-05 bind community from the request host via `bind_community`, falling through to an empty/404 response on an unmapped host (no default tenant, no host echo).
- Background loops get their tenant from the DB row they act on: the reaper builds `TenantContext::resolved(row.community_id, row.host)` per archived channel from the reaper's RETURNING; the dev/CI reconciler and reminder scheduler resolve the one configured community from `relay_url`, fail-closed.
- Deployment-community cases with no connection tenant (git hook/finalize, workflow sink) resolve via the same host-resolution seam.
- Drop the Typesense-only `reindex_kind0` backfill binary, obsolete under the Postgres FTS migration and referenced nowhere.

Admin CLI (`crates/buzz-admin`):
- New `resolve_admin_tenant` reads the `RELAY_URL` host (the CLI runs as `compose exec relay buzz-admin`, sharing the relay's env) and resolves it via `lookup_community_by_host`, fail-closed on an unmapped host.
- Scope the NIP-43 membership-list publish (`EventTopic::Global`), channel reconcile, `get_members`, and the kind:39000 existence `EventQuery` (`..EventQuery::for_community`). Drop the now-dead `uuid` dep.

Workspace gate: `cargo check --workspace` green; buzz-db 97/97, buzz-audit 13/13, buzz-relay 375 + main 1 (`--include-ignored --test-threads=1`), buzz-admin compiles, fmt + buzz-admin clippy clean.
Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Relay E2E applies schema/schema.sql as declarative desired state before the relay starts. The multi-tenant migration added FKs to communities, but the snapshot did not define the table, so pgschema failed before tests ran. Co-authored-by: Mari <95cae996907d7cab9f5dbf43c0f53edeac6ab0b032a6feae4abfd784e467b3f5@sprout-oss.stage.blox.sqprod.co> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
CI Rust Lint + Windows Rust run `cargo clippy --workspace --all-targets -- -D warnings`; the community_id/tenant args pushed six fns to 8 args (clippy's limit is 7), and the new NIP-98 replay code tripped clamp/const-assert lints. Resolve at the bar, matching existing repo conventions:
- 6x `#[allow(clippy::too_many_arguments)]` on the fns that gained a tenant/community arg (same convention already used across buzz-db/relay).
- buzz-pubsub replay TTL: `.max().min()` -> `.clamp()` (floor 120 < ceiling 3600, so it cannot panic; behavior identical, incl. the u64::MAX clamp test).
- buzz-auth replay const-drift tripwires: scoped `#[allow(clippy::assertions_on_constants)]`; the assert-on-constant IS the design (it fails if someone drifts the TTL constants).

Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
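The `.max().min()` to `.clamp()` swap can be checked in isolation. This sketch assumes hypothetical constant names for the 120 s floor and 3600 s ceiling quoted above; `Ord::clamp` panics only when `min > max`, which these constants rule out.

```rust
/// Hypothetical names for the replay-TTL bounds (floor 120 s, ceiling 3600 s).
pub const REPLAY_TTL_FLOOR_SECS: u64 = 120;
pub const REPLAY_TTL_CEILING_SECS: u64 = 3600;

/// Before: the chained form that CI flagged.
pub fn replay_ttl_old(requested: u64) -> u64 {
    requested
        .max(REPLAY_TTL_FLOOR_SECS)
        .min(REPLAY_TTL_CEILING_SECS)
}

/// After: same result for every input, including u64::MAX.
pub fn replay_ttl(requested: u64) -> u64 {
    requested.clamp(REPLAY_TTL_FLOOR_SECS, REPLAY_TTL_CEILING_SECS)
}
```

Because floor < ceiling, both forms agree on every `u64`; the swap is purely a lint fix.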
Relay E2E builds the database from schema/schema.sql via pgschema apply, while the rewrite migration had moved the first-class community_id schema forward. The snapshot was still mostly pre-MT, so it produced unscoped tables such as channels(id) instead of channels(community_id, id). Make the declarative snapshot match migrations/0001_initial_schema.sql exactly so the schema path and migration path create the same tenant-scoped shape. Co-authored-by: Mari <95cae996907d7cab9f5dbf43c0f53edeac6ab0b032a6feae4abfd784e467b3f5@sprout-oss.stage.blox.sqprod.co> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Force-pushed from 3fb25c5 to f1b459b.
The relay resolves each connection's tenant from the durable communities host map and fails closed on an unmapped host. Under the MT schema, channels.community_id is NOT NULL with an FK to communities, so the pre-MT e2e seed (unscoped channel/member INSERTs against an empty communities table) fails, and every e2e client connection 404s at host-binding. The relay never auto-seeds a community (ensure_configured_community has no callers).

Seed the deployment community (host=localhost:3000, matching RELAY_URL=ws://localhost:3000 after normalize_host keeps the non-default port) and thread community_id through the channel/member INSERTs:
- setup-desktop-test-data.sh: insert the community row first, then scope every channel/member INSERT (Desktop E2E Integration).
- start-relay-for-tests.sh: seed the community after schema apply (Relay E2E); psql-or-docker fallback since psql is not on PATH in hermit.
- ci.yml backend-integration: seed after relay start (reconciler retries for 2min), before the NIP-ER reminder suite.

ON CONFLICT targets lower(host) to match idx_communities_host, keeping the seed idempotent.

Verified against live PG: schema apply clean (165 stmts), seed inserts 9 scoped channels + 19 scoped members with zero nulls, host resolves, re-run is idempotent. Adversarial: an unscoped channel INSERT fails not-null and a channel against a nonexistent community fails the FK, proving the community row is load-bearing.

Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Two CI-honesty follow-ups after the first seed surfaced host/ordering mismatches in the MT e2e path (no product behavior change):

1. Desktop E2E 404: the seed-readiness helper queried 127.0.0.1:3000, while the relay reconciles, and the community is seeded, for localhost:3000. normalize_host keeps the non-default port and 127.0.0.1 != localhost, so the inbound host resolved to no community and every /query 404'd. Default the helper to http://localhost:3000, matching the rest of the desktop e2e suite (e2eBridge.ts / bridge.ts already use localhost) and the relay's RELAY_URL.
2. Backend Integration UnmappedHost: the reminder scheduler binds the deployment community once at boot and exits permanently on an unmapped host (no retry, unlike the channel reconciler). The community was being seeded after relay start, leaving the scheduler dead. Apply the schema and seed the community BEFORE starting the relay (dropping BUZZ_AUTO_MIGRATE since the schema is now applied up front), so the scheduler binds on its single boot-time attempt.

Both are test/CI wiring. The Relay E2E suite stays red on a separate, gated body-level bug (command_executor.rs inserts events without community_id) tracked for the §4 scoping slice.

Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co>
Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
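Why 127.0.0.1:3000 and localhost:3000 land on different tenants: host normalization is purely textual, with no DNS step. The sketch below is a hypothetical stand-in for `normalize_host`, assuming the rules described in this PR (case-fold, drop a default `:80`/`:443` port, drop a trailing dot, keep non-default ports); the real buzz-core function may differ in detail.

```rust
/// Hypothetical host canonicalizer (assumed rules, not buzz-core's code).
pub fn normalize_host(raw: &str) -> String {
    let mut host = raw.trim().to_ascii_lowercase();
    // Strip a default port; a non-default port such as :3000 is kept.
    for default_port in [":80", ":443"] {
        if host.ends_with(default_port) {
            host.truncate(host.len() - default_port.len());
            break;
        }
    }
    // "example.com." and "example.com" name the same host.
    if host.ends_with('.') {
        host.pop();
    }
    host
}
```

Under these rules `LocalHost:3000` and `localhost:3000` collapse to one key, while `127.0.0.1:3000` stays distinct because no address resolution happens.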
The desktop e2e integration relay-boot still used BUZZ_AUTO_MIGRATE with no pre-boot community seed, so the channel reconciler bound the deployment community ONCE at boot (outside its retry loop) before setup-desktop-test-data.sh seeded it, hit UnmappedHost, and exited permanently. The reconciler's retry loop only covers late-seeded channels, not a late-seeded community — so the 9 seeded channels were never reconciled and 'loads channels from the relay' saw 0 channels (60s timeout). Both Desktop E2E Integration shards red. Mirror the proven backend-integration ordering: apply schema + seed the localhost:3000 community BEFORE the relay starts, and drop BUZZ_AUTO_MIGRATE (schema is now applied pre-boot). setup-desktop-test-data.sh's own idempotent community seed becomes a no-op; its channel INSERTs are then picked up by the reconciler's retry loop. Co-authored-by: Eva <011987e296fd5006292d2f930b574be47c7801048d1983c46c425d3c95f0cffd@sprout-oss.stage.blox.sqprod.co> Signed-off-by: tlongwell-block <109685178+tlongwell-block@users.noreply.github.com>
Co-authored-by: npub1jh9wn95s0472h86ahapupaf7m6kx4v9sx2n0atj2hltcfer8k06s5n3pyf <95cae996907d7cab9f5dbf43c0f53edeac6ab0b032a6feae4abfd784e467b3f5@sprout-oss.stage.blox.sqprod.co> Signed-off-by: npub1jh9wn95s0472h86ahapupaf7m6kx4v9sx2n0atj2hltcfer8k06s5n3pyf <95cae996907d7cab9f5dbf43c0f53edeac6ab0b032a6feae4abfd784e467b3f5@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: npub1jh9wn95s0472h86ahapupaf7m6kx4v9sx2n0atj2hltcfer8k06s5n3pyf <95cae996907d7cab9f5dbf43c0f53edeac6ab0b032a6feae4abfd784e467b3f5@sprout-oss.stage.blox.sqprod.co> Signed-off-by: npub1jh9wn95s0472h86ahapupaf7m6kx4v9sx2n0atj2hltcfer8k06s5n3pyf <95cae996907d7cab9f5dbf43c0f53edeac6ab0b032a6feae4abfd784e467b3f5@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: Tyler Longwell <tlongwell@block.xyz> Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Co-authored-by: Tyler Longwell <tlongwell@block.xyz> Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
The host-binding seam (Inv_RowZero) must derive a community only from a
request's actual Host. An empty raw_host carries no community evidence,
yet bind_community passed it straight to the resolver. The schema does
not forbid a `host = ''` row in communities (0001_initial_schema.sql:
`host VARCHAR(255) NOT NULL`, unique index on `lower(host)`), so a
misconfigured/empty-host row plus a request with a missing, unreadable,
or whitespace-only Host header would silently bind to that community.
Guard before the resolver lookup: if normalize_host(raw_host) is empty,
return BindError::UnmappedHost. Reuses the existing variant so the
rejection is byte-identical to any other unmapped host — an
unauthenticated caller cannot probe whether an empty-host row exists.
Red-team finding (Attack 2, MEDIUM defense-in-depth) by Sami; her three
proof tests (empty, whitespace-only, plus a non-empty negative control)
are included un-ignored. Verified mutate->red->restore: removing the
guard reds both empty-host gates with the literal fence collapse
(TenantContext{community: X, host: ""}). Full cargo test -p buzz-relay
--lib: 381 passed, 0 failed.
Co-authored-by: Sami <sami@sprout-oss.stage.blox.sqprod.co>
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
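The guard's effect can be shown with a simplified resolver. This sketch assumes an in-memory host map and a trim-and-lowercase stand-in for `normalize_host`; the point is only that an empty normalized host returns the same `UnmappedHost` variant before the map is ever consulted, even when a `""` row exists.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CommunityId(pub u128);

#[derive(Debug, PartialEq, Eq)]
pub enum BindError {
    UnmappedHost,
}

/// Stand-in normalizer (assumption): trim + ASCII lowercase only.
fn normalize_host(raw: &str) -> String {
    raw.trim().to_ascii_lowercase()
}

/// Resolve a community from a raw Host header, refusing to look up an empty host.
pub fn bind_community(
    communities: &HashMap<String, CommunityId>,
    raw_host: &str,
) -> Result<CommunityId, BindError> {
    let host = normalize_host(raw_host);
    if host.is_empty() {
        // Same variant as any other miss, so a caller cannot probe whether
        // an empty-host row exists.
        return Err(BindError::UnmappedHost);
    }
    communities.get(&host).copied().ok_or(BindError::UnmappedHost)
}
```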
Co-authored-by: npub1jh9wn95s0472h86ahapupaf7m6kx4v9sx2n0atj2hltcfer8k06s5n3pyf <95cae996907d7cab9f5dbf43c0f53edeac6ab0b032a6feae4abfd784e467b3f5@sprout-oss.stage.blox.sqprod.co> Signed-off-by: npub1jh9wn95s0472h86ahapupaf7m6kx4v9sx2n0atj2hltcfer8k06s5n3pyf <95cae996907d7cab9f5dbf43c0f53edeac6ab0b032a6feae4abfd784e467b3f5@sprout-oss.stage.blox.sqprod.co>
The H1 fanout fix threaded a server-resolved CommunityId through the connection registry and two membership caches, tipping three clippy lints: register() to 8 args (too_many_arguments) and the moka cache keys to type_complexity. Allow both locally with rationale, matching existing repo convention (observer_owner_cache already carries the same allow in this file; buzz-db/buzz-cli use too_many_arguments allows). Also relocate topic_for_subscription above the req.rs test module to clear items_after_test_module surfaced by --all-targets. No behavior change; buzz-relay 385/0, clippy --all-targets -D warnings clean. Co-authored-by: Tyler Longwell <tlongwell@block.xyz> Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Force-pushed from 915eed7 to bc453f3.
Adds `crates/buzz-conformance/` — the substrate for runtime formal-spec
conformance. It is the **independent oracle** for the multi-tenant relay:
given a trace of seam events recorded by the relay at runtime, the
checker asserts they obey `docs/spec/MultiTenantRelay.tla`. Production
binaries pay zero cost (the relay defaults to `NoopTracer`); test/staging
runs against `JsonlTracer` and the checker re-runs every captured trace.
Crate contents:
- `src/lib.rs` — schema: `TraceStep`, `TraceAction` (8 spec actions +
`ImplBug` for coverage-breach), `AbstractState` (resolved_community,
bound_host, actor), the `Tracer` trait, `NoopTracer` for prod.
- `src/transitions.rs` — re-implementation of the spec's `Next` relation
in Rust, used by the checker. Owned by this crate, not pulled from the
relay — that's what makes the oracle independent.
- `src/checker.rs` — replay engine: `check_trace` returns
`Err(IllegalTransition | StateMismatch | NonInterference | CoverageBreach)`
on any departure from the spec. 9 unit tests covering each failure mode
plus the M2/M8 (`claimed != resolved`) and NI/ReadConfinement bites.
- `tests/replay_fixtures.rs` + `tests/fixtures/*.jsonl` — five tests that
reconstruct three on-disk JSONL fixtures from typed Rust, assert the
committed file matches byte-for-byte (any schema change requires
`BUZZ_CONFORMANCE_UPDATE=1` to refresh), then replay each through
`check_trace`:
- `good.jsonl` → `Ok(())`
- `bad_host_channel_mismatch.jsonl` → `IllegalTransition`
- `bad_coverage_breach.jsonl` → `CoverageBreach`
- `TRACE_SCHEMA.md` — grounds every action in its `MultiTenantRelay.tla`
line and calls out the three load-bearing projection rules.
- `LIMITS.md` — honestly describes what a green run does/doesn't prove,
and the CI command listing the test surfaces.
Production-fence discipline: deps are exactly `serde / serde_json /
thiserror / uuid`. Zero `buzz-*` production crates. `CommunityLabel(Uuid)`
is a newtype in this crate, NOT `buzz_core::CommunityId` — the checker
physically cannot inherit a production bug because it shares no code
with the relay.
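To illustrate how a replay checker can bite on the failure modes listed above, here is a toy subset; the action set, field names, and error shapes are hypothetical simplifications, not the crate's real `TraceStep`/`TraceAction` schema (which mirrors eight spec actions from `MultiTenantRelay.tla`). Like the crate, it shares no types with the relay: `CommunityLabel` is its own newtype (over `u128` here instead of `Uuid`).

```rust
/// Checker-owned label, deliberately not the relay's CommunityId.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CommunityLabel(pub u128);

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TraceAction {
    /// Host resolution bound the request to a community.
    Bind { community: CommunityLabel },
    /// An authorization check carrying the client-asserted community, if any.
    AuthCheck { claimed: Option<CommunityLabel> },
    /// Rows returned to the client, each with its independently looked-up label.
    ReadMessageRows { row_communities: Vec<CommunityLabel> },
    /// Synthetic step recorded when a seam exits without emitting.
    ImplBug { kind: String },
}

#[derive(Debug, PartialEq, Eq)]
pub enum CheckError {
    IllegalTransition { step: usize },
    NonInterference { step: usize },
    CoverageBreach { step: usize },
}

pub fn check_trace(steps: &[TraceAction]) -> Result<(), CheckError> {
    let mut resolved: Option<CommunityLabel> = None;
    for (step, action) in steps.iter().enumerate() {
        match action {
            TraceAction::Bind { community } => resolved = Some(*community),
            TraceAction::AuthCheck { claimed } => {
                // No legal transition before Bind, or when claim != resolved.
                let r = resolved.ok_or(CheckError::IllegalTransition { step })?;
                if matches!(claimed, Some(c) if *c != r) {
                    return Err(CheckError::IllegalTransition { step });
                }
            }
            TraceAction::ReadMessageRows { row_communities } => {
                let r = resolved.ok_or(CheckError::IllegalTransition { step })?;
                // Every returned row must belong to the resolved community.
                if row_communities.iter().any(|c| *c != r) {
                    return Err(CheckError::NonInterference { step });
                }
            }
            TraceAction::ImplBug { .. } => return Err(CheckError::CoverageBreach { step }),
        }
    }
    Ok(())
}
```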
Verify discipline:
- `cargo test -p buzz-conformance --lib` → 9/9
- `cargo test -p buzz-conformance --test replay_fixtures` → 5/5
- Mutate→red→restore proven three times (in an earlier session): row-label
corruption → `NonInterference` fires; trace `claimed = resolved` →
`IllegalTransition` vanishes; counter threshold loosened →
`ImplBug` doesn't fire.
This commit lands the substrate only. The relay-side glue
(`crates/buzz-relay/src/conformance/{mod,tracers}.rs`, `AppState.tracer`,
`EmitGuard`) and the ingest-seam emitter follow on the next branch
(`quinn/conformance-relay-glue`). The req.rs read-seam emitters land
after.
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Add `crates/buzz-relay/src/conformance/` module:
- `Tracer` re-export, `NoopTracer` (production), `JsonlTracer` (test/CI).
- `EmitGuard::arm(tracer, state, kind) → (guard, counting_tracer)`:
RAII coverage breach. The guard wraps the original tracer in a
counting layer; production callers transparently use that wrapper.
If no emit reaches the wrapper before the guard drops, the guard
emits a synthetic `ImplBug` step on the underlying tracer — the
checker treats that as CoverageBreach. The wrapper design means
production paths never need to "disarm" or pass anything around;
a future new exit that forgets to emit will be caught
automatically.
- `state_for_request(tenant, actor)`: builds AbstractState. Pulls
community + host directly from server-resolved TenantContext.
- `claimed_community_from_event`: reads the event's h tag for the
trace's `claimed_community` field — recorded SEPARATELY from
`resolved_community` so M2 (claim≠resolved) and M8 (host/channel
disagreement) bite at the checker.
- `sanitized_reason_for(&IngestError) → SanitizedReason`: 1:1 map
of IngestError variants (Rejected/AuthFailed/Internal) onto the
closed SanitizedReason alphabet (Invalid/Restricted/ServerError).
Adding a fourth IngestError variant breaks this match — CI catches
it before it ships.
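The counting-tracer design behind `EmitGuard::arm` can be sketched with the standard library alone. This is a simplified stand-in, assuming a one-method `Tracer` trait and a string-tagged `Step`; the real `arm` also takes the abstract state, which is omitted here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Step {
    Action(String),
    ImplBug { seam: String },
}

pub trait Tracer: Send + Sync {
    fn record(&self, step: Step);
}

/// Test tracer that keeps every step in memory.
#[derive(Default)]
pub struct VecTracer {
    pub steps: Mutex<Vec<Step>>,
}

impl Tracer for VecTracer {
    fn record(&self, step: Step) {
        self.steps.lock().unwrap().push(step);
    }
}

/// Wrapper handed to the request path; counts every step it forwards.
pub struct CountingTracer {
    inner: Arc<dyn Tracer>,
    count: Arc<AtomicUsize>,
}

impl Tracer for CountingTracer {
    fn record(&self, step: Step) {
        self.count.fetch_add(1, Ordering::SeqCst);
        self.inner.record(step);
    }
}

/// RAII guard: if nothing passed through the wrapper, Drop records ImplBug.
pub struct EmitGuard {
    inner: Arc<dyn Tracer>,
    count: Arc<AtomicUsize>,
    seam: &'static str,
}

impl EmitGuard {
    pub fn arm(inner: Arc<dyn Tracer>, seam: &'static str) -> (EmitGuard, CountingTracer) {
        let count = Arc::new(AtomicUsize::new(0));
        let counting = CountingTracer { inner: Arc::clone(&inner), count: Arc::clone(&count) };
        (EmitGuard { inner, count, seam }, counting)
    }
}

impl Drop for EmitGuard {
    fn drop(&mut self) {
        if self.count.load(Ordering::SeqCst) == 0 {
            self.inner.record(Step::ImplBug { seam: self.seam.to_string() });
        }
    }
}
```

No code path has to "disarm" anything: any exit that forgets to emit leaves the count at zero, and the drop fires the breach.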
Emitter wiring in `crates/buzz-relay/src/handlers/ingest.rs`:
- `ingest_event` is now a thin wrapper: arms EmitGuard, calls
`ingest_event_inner`, and on Err maps to SanitizedError. All 36
early-Err returns and 6 Ok returns in the inner fn are covered
by this single outer mapping.
- At `check_channel_membership` call site (line 1401): emits
`AuthCheck { channel, claimed_community, verdict }` with
Allow on Ok, Deny on Err. The verdict basis is `tenant.community()`
server-resolved — confirmed at ingest.rs:424's `is_member_cached`
call signature (not event-derived).
- At each `dispatch_persistent_event` call site (lines 1908, 2014):
emits `WriteInsert` (channel + was_inserted=true), `WriteDuplicate`
(channel + was_inserted=false), or `WriteInsertGlobal` (no channel).
This is the entire write side of the ingest seam.
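The choice among the three write-side actions reduces to a small match. This sketch uses hypothetical names (`WriteAction`, `write_action`, a `u128` channel id) and follows the rule as stated: a channel-less write always records `WriteInsertGlobal`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ChannelId(pub u128);

#[derive(Debug, PartialEq, Eq)]
pub enum WriteAction {
    WriteInsert { channel: ChannelId },
    WriteDuplicate { channel: ChannelId },
    WriteInsertGlobal,
}

/// Pick the trace action for one persisted event.
pub fn write_action(channel: Option<ChannelId>, was_inserted: bool) -> WriteAction {
    match channel {
        Some(channel) if was_inserted => WriteAction::WriteInsert { channel },
        Some(channel) => WriteAction::WriteDuplicate { channel },
        None => WriteAction::WriteInsertGlobal,
    }
}
```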
`crates/buzz-relay/src/handlers/event.rs` `dispatch_persistent_event`:
- No emit added. Documented why inline: the spec has no separate
fan-out action — acceptance is recorded at ingest's WriteInsert;
fan-out surfaces as ReadMessageRows on the subscriber side (the
read seam in req.rs, lands in the held-back additive diff).
AppState carries `tracer: Arc<dyn buzz_conformance::Tracer>`,
defaulting to NoopTracer (zero cost). Test contexts overwrite this
field with a JsonlTracer after construction.
Verify:
- cargo check -p buzz-relay green
- cargo test -p buzz-relay --lib: 378/378 (no regressions)
- cargo test -p buzz-conformance: 9/9 (checker still bites all four
failure modes)
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Two unit tests in `crates/buzz-relay/src/conformance/mod.rs` prove the structural fail-closed property of the [`EmitGuard`]:
- `emit_guard_drop_records_exactly_one_impl_bug_when_no_emit`: drop the guard with zero recorded steps on the returned counting tracer → exactly one `ImplBug` step lands on the inner tracer, carrying the seam-name string passed to `EmitGuard::arm`.
- `emit_guard_drop_is_silent_when_an_emit_reached_the_tracer`: record at least one step through the counting wrapper → Drop emits no `ImplBug`, only the original step.

These pin the counting-tracer design against a future refactor to a "disarm" flag, which would only fail closed by author discipline. The counting wrapper makes coverage breach fire *structurally*, regardless of what the request path did or didn't do; that's the whole point of the coverage-breach mode.

Verify: `cargo test -p buzz-relay --lib conformance::` → 2/2. Full `cargo test -p buzz-relay --lib -- --test-threads=1` → 387/0.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Wire the conformance read seam at `req.rs` channel-membership
confirmation. When the relay falls through to the DB-uncached
membership check, record one `AuthCheck` step on the tracer
mapping `is_member` → `Allow`/`Deny`.
Design notes:
- `trace_state` is built once at request entry, after `pubkey_bytes`
is available. Reused by every downstream emit (matches ingest's
`state_for_request` discipline). The `Option` only goes `None`
on malformed pubkey bytes — a separate failure path.
- `claimed_community: None` is the load-bearing choice on the read
path: the REQ wire has NO client-asserted community (the `h` filter
is a channel-id, not a community-id). Encoding `None` here rather
than copying the resolved community means a future regression that
ever starts reading a wire-community on REQ would need to put a
real value in the field — that surfaces at code-review time
instead of silently projecting away the M2 (claim ≠ resolved) bite.
- No `EmitGuard` at REQ entry: read paths legitimately skip the DB
on cache hit (no `is_member` call), so a coverage-breach guard
would false-positive on the common case. Coverage for the read
seam comes from the upcoming row-emit fixtures, not from a guard
at the entry point.
Implementation:
- New `crate::conformance::record_req_authcheck` helper centralises
the emit so the call site stays one line and the helper carries
the design rationale in its doc comment.
- Two unit tests pin the verdict mapping table:
`record_req_authcheck_emits_allow_with_none_claim_when_member` and
`record_req_authcheck_emits_deny_when_not_member`. Mutate→red→
restore verified: inverting the `if member` branches reds both
tests with explicit panic messages ("member=true must map to
Allow", "member=false must map to Deny"); restored both green.
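The verdict mapping pinned by those two tests is small enough to show whole. This is a hypothetical simplification: it returns the step instead of recording it on a tracer, and uses `u128` ids. Note `claimed_community` is always `None` on REQ, per the design note above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Allow,
    Deny,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AuthCheck {
    pub channel: u128,
    /// REQ carries no client-asserted community, so this is never filled in.
    pub claimed_community: Option<u128>,
    pub verdict: Verdict,
}

/// Map a DB membership result onto the trace's AuthCheck step.
pub fn req_authcheck(channel: u128, is_member: bool) -> AuthCheck {
    AuthCheck {
        channel,
        claimed_community: None,
        verdict: if is_member { Verdict::Allow } else { Verdict::Deny },
    }
}
```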
Test surfaces:
- `cargo test -p buzz-relay --lib` → 389/0 (was 387 baseline).
- `cargo test -p buzz-conformance` → 14/0 unchanged.
- `cargo clippy -p buzz-relay --all-targets -- -D warnings` clean.
- `cargo fmt --all -- --check` clean.
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Land the read-seam emitter for the runtime conformance gate. Two
emit sites, one buzz-db helper, one negative fixture.
## buzz-db: `communities_of_channels` helper
`Buzz::communities_of_channels(&[Uuid]) -> HashMap<Uuid, CommunityId>`
— batched per-channel community lookup. Used by the relay emitters to
project each row's true community label independently of the fetch
query's WHERE clause. That independence is what makes the
`Inv_NonInterference` / `Inv_ReadConfinement` bite non-vacuous: a
mutation dropping `community_id = $X` from `query_events` would
still let this helper return the row's true label and the checker
would catch the mismatch.
Channels missing from the result map are intentionally NOT mapped to
a default — callers MUST treat "channel-id not in map" as a coverage
breach, never as "use the resolved community."
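The absence contract can be illustrated with an in-memory stand-in for the `channels` table. This is a sketch under stated assumptions: rows are `(channel, community)` pairs and ids are `u128`; the real helper is a batched SQL query in buzz-db.

```rust
use std::collections::HashMap;

pub type ChannelId = u128;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CommunityId(pub u128);

/// Return the true community for each requested channel that has a row.
pub fn communities_of_channels(
    rows: &[(ChannelId, CommunityId)],
    channel_ids: &[ChannelId],
) -> HashMap<ChannelId, CommunityId> {
    let mut out = HashMap::new();
    for (channel, community) in rows {
        if channel_ids.contains(channel) {
            out.insert(*channel, *community);
        }
    }
    // Deliberately no pass over `channel_ids` filling defaults: an id with
    // no row stays absent, which callers treat as a coverage breach.
    out
}
```

The mutant described later in this PR (an `or_insert(nil)` loop after the query) is exactly the line this sketch leaves out.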
## buzz-relay: projection + record helpers
`crate::conformance` gains four new items:
- `project_row_community` — single-row helper encoding the (B)
strategy: channel-less → resolved (honest, not tautological);
channel-scoped → lookup or `None` (caller fails closed).
- `RowCommunityProjection` enum — Ok(Vec<CommunityLabel>) OR
MissingLookup discriminated outcome.
- `record_read_message_rows` — non-search lane: emits
`ReadMessageRows` on Ok projection, `ImplBug { kind:
"row_community_lookup_missing" }` on MissingLookup.
- `record_read_by_id_rows` — search lane companion, same shape but
emits `ReadByIdRows`. `filter_channel` is `None` for the search
lane (search at the abstract level isn't bound to a single channel;
per-row `channel_id` carries channel identity honestly).
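A sketch of the (B) projection strategy, assuming simplified types (`u128` channel ids, a plain `HashMap` lookup) and a hypothetical `project_rows` helper that folds the per-row result into the two-outcome enum. Passing an empty map models the DB-error fallback: any channel-scoped row then yields `MissingLookup` instead of a silent resolved label.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CommunityLabel(pub u128);

#[derive(Debug, PartialEq, Eq)]
pub enum RowCommunityProjection {
    Ok(Vec<CommunityLabel>),
    MissingLookup,
}

/// Channel-less rows take the resolved label; channel-scoped rows must hit the lookup.
pub fn project_row_community(
    channel: Option<u128>,
    resolved: CommunityLabel,
    lookup: &HashMap<u128, CommunityLabel>,
) -> Option<CommunityLabel> {
    match channel {
        None => Some(resolved),
        Some(ch) => lookup.get(&ch).copied(),
    }
}

/// Hypothetical batch helper: one missing lookup makes the whole read a breach.
pub fn project_rows(
    row_channels: &[Option<u128>],
    resolved: CommunityLabel,
    lookup: &HashMap<u128, CommunityLabel>,
) -> RowCommunityProjection {
    let mut labels = Vec::with_capacity(row_channels.len());
    for ch in row_channels {
        match project_row_community(*ch, resolved, lookup) {
            Some(label) => labels.push(label),
            None => return RowCommunityProjection::MissingLookup,
        }
    }
    RowCommunityProjection::Ok(labels)
}
```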
## req.rs wire-up: two emit sites
- Site 1 (`req.rs` non-search loop, after `query_events`): collect
distinct channel ids from the result set → `communities_of_channels`
→ `record_read_message_rows`. Production cost: one extra DB query
per request (NoopTracer short-circuits in non-conformance builds).
- Site 2 (`req.rs` search loop, after `get_events_by_ids`): same
pattern → `record_read_by_id_rows`. `handle_search_req` gains a
threaded-through `trace_state: Option<&AbstractState>` parameter.
DB-helper errors on either site fall back to an empty lookup map.
This intentionally triggers `MissingLookup` → `ImplBug` for any
channel-scoped row in the result set, surfacing the helper failure
as a coverage breach (fail-closed) rather than a silent resolved-
label substitution.
## Negative fixture: foreign-row leak
New `bad_foreign_row_leak.jsonl` + matching test. The fixture is a
`ReadMessageRows` whose row_communities contains community B while
the state is bound to community A. This is the proof artifact Eva
requested for the (B)-projection guard-rail: if the row had been
mis-projected as channel-less (defaulting to resolved A), the subset
check would have passed vacuously. By recording the row's TRUE
community independently, `Inv_NonInterference` surfaces it
immediately as `NonInterference`.
## Unit tests (conformance::tests)
Five new tests pinning every behavior:
- `project_row_communities_channelless_uses_resolved` (positive)
- `project_row_communities_channel_scoped_uses_lookup_label` (the
non-tautological correctness — lookup label, NOT resolved)
- `project_row_communities_channel_scoped_missing_is_breach` (the
guard-rail bite)
- `record_read_message_rows_missing_lookup_emits_impl_bug`
- `record_read_by_id_rows_ok_emits_read_by_id_rows`
## Mutate → red → restore (three independent bites)
1. Make `project_row_community` fall back to resolved on a missing lookup
(the tempting wrong-fix): `project_row_communities_channel_scoped_missing_is_breach`
+ `record_read_message_rows_missing_lookup_emits_impl_bug` go red with
explicit messages ("missing lookup must be a breach, got Ok([...])",
"expected ImplBug coverage breach, got ReadMessageRows {...}"). Restored.
2. Make every channel-scoped row project to resolved (the
tautological projection): 4 of 9 unit tests go red — the lookup-
label, missing-breach, and record-helper tests all bite. Restored.
3. Edit the negative fixture to use community_a instead of
community_b: `foreign_row_leak_is_non_interference` reds ("foreign
row community label must be rejected by Inv_NonInterference"). The
fixture is load-bearing, not decorative. Restored.
## Test surfaces
- `cargo test -p buzz-relay --lib` → **394/0** (was 387 baseline).
- `cargo test -p buzz-conformance --lib` → 9/0.
- `cargo test -p buzz-conformance --test replay_fixtures` → **6/0**
(was 5; +foreign_row_leak).
- `cargo test -p buzz-db --lib` → 75/0 (helper compiles; DB-driven
integration coverage lives in `--include-ignored` lane on PG).
- `cargo clippy -p buzz-relay -p buzz-db -p buzz-conformance
--all-targets -- -D warnings` clean.
- `cargo fmt --all -- --check` clean.
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Co-authored-by: Tyler Longwell <tlongwell@block.xyz> Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
…tract
The relay-side read-row emitter relies on a load-bearing contract from
`Db::communities_of_channels`: a channel id with no row in the DB MUST
be absent from the returned map, never mapped to a default. The relay's
`MissingLookup → ImplBug{row_community_lookup_missing} → CoverageBreach`
fail-closed guard-rail goes blind if this helper ever started returning
a default/zero entry for unknown channels — and the relay-side
mutate-bite for that guard-rail wouldn't catch it (different layer).
This adds a PG-ignored test that pins both directions:
- (1) Existing channel → present with its true community.
- (2) Missing channel → ABSENT from the result map (load-bearing).
- (3) Map size equals the number of existing channels.
Mutate → red → restore verified against live Postgres:
Mutant: post-loop `for ch in channel_ids { entry().or_insert(nil) }`
Result: assertion (2) bites with explicit message
"missing channel must be absent from the result map,
got Some(CommunityId(00000000-…))"
Restored: green.
Closes the read-seam fail-closed chain end-to-end (DB layer through
checker), making it non-vacuous across both layers.
Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
Adds two regression tests against Max's NIP-98 replay-guard wiring (b30869d), proving distinct properties beyond his existing cross-pod test:

1. nip98_replay_guard_rejects_same_pod_same_community_replay (#[ignore], requires Redis): a single guard instance, A1 then A2 with the same TenantContext, rejects the second call. Guards against a fix that accidentally weakens same-pod replay detection when moka is replaced with the shared Redis seen-set.
2. nip98_replay_check_fails_closed_when_guard_errors (CI-runnable, no Redis): injects a stub Nip98ReplayGuard that always returns Err(AuthError::Internal(_)); asserts check_nip98_replay_with_guard maps it to (401, error="NIP-98: replay check unavailable"). Exercises the Err => arm of the match (bridge.rs:107-117), which is otherwise untested. This is the load-bearing fail-closed property: a stateless worker that loses Redis MUST reject rather than admit (Nip98ReplayGuard trait contract, buzz-auth/src/nip98_replay.rs:70-73).

Mutate→red→restore (two orthogonal bites, both verified):
- Mutate Ok(false) => Err(...) → Ok(()) at bridge.rs:103. Reds Max's cross_pod test AND same_pod_replay; fail_closed stays green (it doesn't exercise this arm). Restored.
- Mutate Err(e) => Err(api_error(...)) → Ok(()) at bridge.rs:113. Reds ONLY fail_closed; cross_pod and same_pod stay green (they don't trigger Err). Restored.

Distinct mutations bite distinct tests, which proves the new properties are load-bearing, not vacuous, and independent of Max's coverage.

Verification on this commit:
- cargo fmt --all -- --check ✅
- cargo clippy -p buzz-relay --all-targets -- -D warnings ✅
- REDIS_URL=… cargo test -p buzz-relay --lib api::bridge::tests::nip98_replay -- --include-ignored --test-threads=1 → 3/0 ✅ (Max's + 2 new)
- cargo test -p buzz-relay -- --test-threads=1 → 390/0 + 2 ignored ✅ (baseline 389/1 ignored; +1 passing fail-closed, +1 ignored same-pod)

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
(cherry picked from commit 64a94be)
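The fail-closed shape of the three match arms can be sketched with a stub guard, as the second test does. Everything here is a hypothetical simplification: the trait method name `check_and_record`, the `(community, event_id)` key, the `check_replay` wrapper, and the "replayed event" message are assumptions; only the 401 status and the "NIP-98: replay check unavailable" text come from the commit above.

```rust
use std::cell::RefCell;
use std::collections::HashSet;

#[derive(Debug)]
pub enum AuthError {
    Internal(String),
}

/// Assumed contract: Ok(true) = first sighting, Ok(false) = replay, Err = store unreachable.
pub trait Nip98ReplayGuard {
    fn check_and_record(&self, community: u128, event_id: &str) -> Result<bool, AuthError>;
}

/// In-memory stand-in for the shared seen-set.
#[derive(Default)]
pub struct InMemoryGuard {
    seen: RefCell<HashSet<(u128, String)>>,
}

impl Nip98ReplayGuard for InMemoryGuard {
    fn check_and_record(&self, community: u128, event_id: &str) -> Result<bool, AuthError> {
        // `insert` returns false when the key was already present.
        Ok(self.seen.borrow_mut().insert((community, event_id.to_string())))
    }
}

/// Stub that simulates losing Redis.
pub struct UnavailableGuard;

impl Nip98ReplayGuard for UnavailableGuard {
    fn check_and_record(&self, _community: u128, _event_id: &str) -> Result<bool, AuthError> {
        Err(AuthError::Internal("seen-set unreachable".to_string()))
    }
}

#[derive(Debug, PartialEq, Eq)]
pub struct ApiError {
    pub status: u16,
    pub message: &'static str,
}

pub fn check_replay(guard: &dyn Nip98ReplayGuard, community: u128, event_id: &str) -> Result<(), ApiError> {
    match guard.check_and_record(community, event_id) {
        Ok(true) => Ok(()),
        Ok(false) => Err(ApiError { status: 401, message: "NIP-98: replayed event" }),
        // Fail closed: an unavailable seen-set must reject, never admit.
        Err(_) => Err(ApiError { status: 401, message: "NIP-98: replay check unavailable" }),
    }
}
```

Mutating the `Err(_)` arm to `Ok(())` is exactly the kind of change the `fail_closed` test is meant to catch.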
## Multi-tenant Buzz relay: `community_id` as a first-class, server-resolved key

Makes `community_id` a first-class, server-resolved key on every scoped row: the relay derives a connection's tenant from durable data (the request host → `communities` row), never from caller input, and threads a `&TenantContext` through every scoped DB read and every Redis publish. This is the foundation for hosting Buzz multi-tenant on shared infra with provable cross-community isolation, checked against the machine-proven safety spec landed in #1285.

### The fence

`TenantContext` is minted only on the host-resolution path (`bind_community` / the reaper's per-row `TenantContext::resolved`). Everywhere downstream takes `&TenantContext` by reference and reads it; nothing else constructs one from client input. Read-path caches take `CommunityId`; write/invalidate publishers take `&TenantContext` (the Redis topic key needs the host).

### What's in this PR (22 commits, path-partitioned lanes)

- `TenantContext` + `CommunityId`, the server-resolved tenant fence.
- `community_id`-native schema, `EventQuery::for_community`, by-id/by-channel reads scoped, reaper RETURNING `(community, host)` per archived row, archived-identities composite-PK fix.
- Redis topic keys (`buzz:{community}:channel:{channel}` / `:global`), tenant-scoped topic refcounts.
- `ChannelScope` enum closes the channel-less fence hole.
- `audit_log` DDL; `NewAuditEntry.community_id` widened to `CommunityId`.
- Background loops resolve the configured community from `relay_url` (dev/CI reconciler, reminder scheduler); deployment-community cases with no connection tenant (git hook/finalize, workflow sink) resolve via the same seam.
- Admin CLI: `resolve_admin_tenant` reads the `RELAY_URL` host → `lookup_community_by_host`, fail-closed; membership-list publish, reconcile, and existence query scoped.
- Drops the `reindex_kind0` backfill binary (obsolete under Postgres FTS).

### Behavior changes called out for review

- `.well-known/nostr.json` is now host-bound (previously single-tenant, keyed off `config.relay_url`): it binds the community from the request Host header and falls through to an empty `{names, relays}` on an unmapped host.
- `BUZZ_RECONCILE_CHANNELS` reconciles the configured community only (dev/CI single-community). In a multi-community deploy the safe failure mode is an incomplete reconcile, never cross-tenant access.

### Verification

- `cargo check --workspace` green.
- `buzz-db` 97/97, `buzz-audit` 13/13, `buzz-relay` 375 + main 1 (`--include-ignored --test-threads=1` against local PG), `buzz-admin` compiles.
- `cargo fmt --check` clean; `buzz-admin` clippy clean.
- `audit_records_caller_actor_not_relay_signer_for_relay_signed_event`: proven to bite by reverting the fix (it then recorded the relay signer instead of the caller actor); restored, green.

### Provenance

Every commit is author == committer == `Signed-off-by` = `tlongwell-block <…@users.noreply.github.com>` (DCO-clean), with lane agent credit in `Co-authored-by`.

Based on the #1285 safety floor (`main@2ecdcce7b`). Supersedes #1259 (Typesense removal folded in here).